11 research outputs found

    Fully Convolutional Neural Networks for Dynamic Object Detection in Grid Maps

    Grid maps are widely used in robotics to represent obstacles in the environment, and differentiating dynamic objects from static infrastructure is essential for many practical applications. In this work, we present a method that uses a deep convolutional neural network (CNN) to infer whether grid cells are covering a moving object or not. Compared to tracking approaches, which use e.g. a particle filter to estimate grid cell velocities and then make a decision for individual grid cells based on this estimate, our approach uses the entire grid map as the input image for a CNN that inspects a larger area around each cell and thus takes the structural appearance in the grid map into account to make a decision. Compared to our reference method, our concept yields a performance increase from 83.9% to 97.2%. A runtime-optimized version of our approach yields similar improvements with an execution time of just 10 milliseconds. Comment: This is a shorter version of the master's thesis of Florian Piewak and it was accepted at IV 201

    Combining Appearance, Depth and Motion for Efficient Semantic Scene Understanding

    Computer vision plays a central role in autonomous vehicle technology, because cameras are comparatively cheap and capture rich information about the environment. In particular, object classes, i.e. whether a certain object is a pedestrian, cyclist or vehicle, can be extracted very well based on image data. Environment perception in urban city centers is a highly challenging computer vision problem, as the environment is very complex and cluttered: road boundaries and markings, traffic signs and lights, and many different kinds of objects that can mutually occlude each other need to be detected in real time. Existing automotive vision systems do not easily scale to these requirements, because every problem or object class is treated independently. Scene labeling, on the other hand, which assigns object class information to every pixel in the image, is the most promising approach to avoid this overhead by sharing extracted features across multiple classes. Compared to bounding box detectors, scene labeling additionally provides richer and denser information about the environment. However, most existing scene labeling methods require a large amount of computational resources, which makes them infeasible for real-time in-vehicle applications. In addition, in terms of bandwidth, a dense pixel-level representation is not ideal for transmitting the perceived environment to other modules of an autonomous vehicle, such as localization or path planning. This dissertation addresses the scene labeling problem in an automotive context by constructing a scene labeling concept around the "Stixel World" model of Pfeiffer (2011), which compresses dense information about the environment into a set of small "sticks" that stand upright, perpendicular to the ground plane. This work provides the first extension of the existing Stixel formulation that takes into account learned dense pixel-level appearance features.
    In a second step, Stixels are used as primitive scene elements to build a highly efficient region-level labeling scheme. The last part of this dissertation finally proposes a model that combines both pixel-level and region-level scene labeling into a single model that yields state-of-the-art or better labeling accuracy and can be executed in real time at typical camera refresh rates. This work further investigates how existing depth information, e.g. from a stereo camera, can help to improve labeling accuracy and reduce runtime.

    The Cityscapes Dataset for Semantic Urban Scene Understanding

    Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes. To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes comprises a large, diverse set of stereo video sequences recorded in streets from 50 different cities. 5000 of these images have high-quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark. Comment: Includes supplemental material

    Tree-Structured Models for Efficient Multi-Cue Scene Labeling

    We propose a novel approach to semantic scene labeling in urban scenarios, which aims to combine excellent recognition performance with the highest levels of computational efficiency. To that end, we exploit efficient tree-structured models on two levels: pixels and superpixels. At the pixel level, we propose to unify pixel labeling and the extraction of semantic texton features within a single architecture, so-called encode-and-classify trees. At the superpixel level, we put forward a multi-cue segmentation tree that groups superpixels at multiple granularities. Through learning, the segmentation tree effectively exploits and aggregates a wide range of complementary information present in the data. A tree-structured CRF is then used to jointly infer the labels of all regions across the tree. Finally, we introduce a novel object-centric evaluation method that specifically addresses the urban setting with its strongly varying object scales. Our experiments demonstrate competitive labeling performance compared to the state of the art, while achieving near real-time frame rates of up to 20 fps.

    The antidepressant Sertraline inhibits CatSper Ca2+ channels in human sperm

    Study question: Do selective serotonin reuptake inhibitor (SSRI) antidepressants affect the function of human sperm? Summary answer: The SSRI antidepressant Sertraline (e.g. Zoloft) inhibits the sperm-specific Ca2+ channel CatSper and affects human sperm function in vitro.